[SOUND]
This
lecture is about
the contextual text mining.
Contextual text mining
is related to multiple
kinds of knowledge that we mine from
text data, as I'm showing here.
It's related to topic mining because you
can make topics associated with context,
like time or location.
And similarly, we can make opinion
mining more contextualized,
making opinions connected to context.
It's related to text based prediction
because it allows us to combine non-text
data with text data to derive
sophisticated predictors for
the prediction problem.
So more specifically, why are we
interested in contextual text mining?
Well, that's first because text
often has rich context information.
And this can include direct context such
as meta-data, and also indirect context.
So, the direct context can grow
the meta-data such as time,
location, authors, and
source of the text data.
And they're almost always available to us.
Indirect context refers to additional
data related to the meta-data.
So for example, from office,
we can further obtain additional
context such as social network of
the author, or the author's age.
Such information is not in general
directly related to the text, yet
through the process, we can connect them.
There could be other text
data from the same source,
as this one through the other text can
be connected with this text as well.
So in general, any related data
can be regarded as context.
So there could be removed or
rated for context.
And so what's the use?
What is text context used for?
Well, context can be used to partition
text data in many interesting ways.
It can almost allow us to partition
text data in other ways as we need.
And this is very important
because this allows
us to do interesting comparative analyses.
It also in general,
provides meaning to the discovered topics,
if we associate the text with context.
So here's illustration of how context
can be regarded as interesting
ways of partitioning of text data.
So here I just showed some research
papers published in different years.
On different venues,
different conference names here listed on
the bottom like the SIGIR or ACL, etc.
Now such text data can be partitioned
in many interesting ways
because we have context.
So the context here just includes time and
the conference venues.
But perhaps we can include
some other variables as well.
But let's see how we can partition
this interesting of ways.
First, we can treat each
paper as a separate unit.
So in this case, a paper ID and the,
each paper has its own context.
It's independent.
But we can also treat all the papers
within 1998 as one group and
this is only possible because
of the availability of time.
And we can partition data in this way.
This would allow us to compare topics for
example, in different years.
Similarly, we can partition
the data based on the menus.
We can get all the SIGIR papers and
compare those papers with the rest.
Or compare SIGIR papers with KDD papers,
with ACL papers.
We can also partition the data to obtain
the papers written by authors in the U.S.,
and that of course,
uses additional context of the authors.
And this would allow us to then
compare such a subset with
another set of papers written
by also seen in other countries.
Or we can obtain a set of
papers about text mining, and
this can be compared with
papers about another topic.
And note that these
partitionings can be also
intersected with each other to generate
even more complicated partitions.
And so in general, this enables
discovery of knowledge associated with
different context as needed.
And in particular,
we can compare different contexts.
And this often gives us
a lot of useful knowledge.
For example, comparing topics over time,
we can see trends of topics.
Comparing topics in different
contexts can also reveal differences
about the two contexts.
So there are many interesting questions
that require contextual text mining.
Here I list some very specific ones.
For example, what topics have
been getting increasing attention
recently in data mining research?
Now to answer this question,
obviously we need to analyze
text in the context of time.
So time is context in this case.
Is there any difference in the responses
of people in different regions
to the event, to any event?
So this is a very broad
an answer to this question.
In this case of course,
location is the context.
What are the common research
interests of two researchers?
In this case, authors can be the context.
Is there any difference in the research
topics published by authors in the USA and
those outside?
Now in this case,
the context would include the authors and
their affiliation and location.
So this goes beyond just
the author himself or herself.
We need to look at the additional
information connected to the author.
Is there any difference in the opinions
of all the topics expressed on
one social network and another?
In this case, the social network of
authors and the topic can be a context.
Other topics in news data that
are correlated with sudden changes in
stock prices.
In this case, we can use a time series
such as stock prices as context.
What issues mattered in the 2012
presidential campaign, or
presidential election?
Now in this case,
time serves again as context.
So, as you can see,
the list can go on and on.
Basically, contextual text mining
can have many applications.
[MUSIC]

